
Swiss Medical Weekly

SMW Supporting Association

Preprints posted in the last 90 days, ranked by how well they match Swiss Medical Weekly's content profile, based on 12 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.

1
A continental-scale scenario modelling framework for evaluating infant RSV immunisation strategies across Europe

Viola, E.; Mazzoli, M.; Paolotti, D.; Rizzo, A.; Zino, L.; Gozzi, N.

2026-06-11 epidemiology 10.64898/2026.06.10.26355338 medRxiv
Top 0.1%
3.6%

Background. The recent approval of long-acting monoclonal antibodies (la-mAbs) and a maternal vaccine (MV) in the EU enables universal RSV prevention in infants. Modelling studies are widely used to quantify the population-level impact of alternative immunisation strategies. However, existing assessments of new RSV immunisation products focus on national or sub-national settings. Methods. We developed an age-stratified, stochastic compartmental model of RSV transmission for 28 EU/EEA countries. It combines literature-based parameters on RSV natural history and product efficacy with country-specific demographic and contact patterns. After model calibration against age- and country-specific RSV hospitalisation rates, we designed scenarios for both la-mAbs and MV at four coverage levels, with and without catch-up immunisation for infants under six months at season onset. We then evaluated each scenario against a no-immunisation baseline. Results. At 95% coverage, the cross-country median reduction in RSV hospitalisations over one season in infants under 12 months is 29.9% for la-mAbs (country median range: 27.7-33.9%) and 22.4% for MV (20.0-25.6%), scaling linearly with coverage. Out of all averted hospitalisations, 78.3% (90% CI: [67.3, 92.7]%) are concentrated in infants aged 0-2 months for la-mAbs and 72.7% (90% CI: [61.4, 88.6]%) for MV. A catch-up campaign nearly doubles the overall reduction in RSV hospitalisations. Conclusions. Despite country-specific heterogeneities, impact of la-mAbs and MV is comparable across settings and herd-immunity effects are largely negligible. This supports harmonised European guidelines on coverage targets. Seasonal catch-up campaigns emerge as an effective lever to maximise the impact of immunisation programmes.

2
SEIR-IoT cyber-physical architecture with dual parametric coupling for epidemic scenario simulation using synthetic biomedical signals

Martinez Campo, S. D.; Campo-Ariza, F. M.; Martinez Campo, J. A.; Cormane, M.

2026-05-10 epidemiology 10.64898/2026.05.06.26352603 medRxiv
Top 0.1%
2.5%

This study presents a proof-of-concept cyber-physical architecture integrating a SEIR epidemiological model (Susceptible-Exposed-Infectious-Recovered), implemented in MATLAB, with a simulated Internet of Things (IoT) acquisition and transmission stage based on the ESP32 microcontroller and the ThingSpeak platform. The system generates synthetic biomedical signals of body temperature and peripheral oxygen saturation (SpO2), structured across three levels: circadian variation, scheduled pathological episodes, and Gaussian noise. These signals feed a dual parametric coupling function that dynamically updates the SEIR transmission parameter as a combined function of body temperature and oxygen saturation deviations from their clinical reference values. The proposed architecture is organized into four functional phases: measurement, communication, computational processing, and feedback. Five simulated clinical scenarios were evaluated, ranging from normal conditions (T = 36.5 °C, SpO2 = 97%) to fever with severe hypoxia (T = 38.5 °C, SpO2 = 88%), yielding basic reproduction number (R0) values between 4.20 and 5.38, and peak infected proportions between 29.9% and 35.2% of the simulated population (N = 1,000). A sensitivity analysis on the coupling coefficients, with ±50% variation from nominal values, showed that the oxygen saturation coefficient is the most influential parameter on R0 (range = 0.76) compared to the thermal coefficient (range = 0.42), with monotonic and predictable behavior across the entire evaluated parametric space. The primary contribution of this work is system integration: we propose a reproducible platform connecting biomedical simulation, IoT communication, and epidemiological modeling through parametric coupling in a controlled environment. All data used are entirely synthetic; a retrospective calibration with real Colombian data from the first epidemic wave of 2020 confirmed the epidemiological consistency of the model, with a calibrated R0 of 1.85 and a Pearson correlation of 0.930. Results should be interpreted as evidence of architectural feasibility, not as clinical or epidemiological validation.

Author Summary: The COVID-19 pandemic made it clear that epidemiological surveillance systems need tools that combine accessible technology with mathematical models capable of anticipating disease spread. In this work, we built a proof-of-concept platform connecting three elements: a low-cost electronic sensor based on the ESP32 microcontroller, a cloud communication platform (ThingSpeak), and a mathematical model that simulates how an epidemic spreads through a population. The sensor generates synthetic data on body temperature and oxygen saturation that, through a mathematical formula we designed, dynamically modify the rate of contagion in the model. We evaluated five clinical scenarios, ranging from normal conditions to fever with severe hypoxia, and analyzed how sensitive the results are to changes in the system parameters. We found that oxygen saturation has a greater influence on the estimated contagion potential than body temperature. Although all data are synthetic, this platform demonstrates that it is possible to integrate low-cost sensors with epidemiological models in real time, opening a viable pathway for early warning systems in resource-limited settings.

3
From naive to foundation: benchmarking models for epidemic forecasting

Wang, D.; Li, Y.; Perra, N.

2026-05-13 epidemiology 10.64898/2026.05.11.26352889 medRxiv
Top 0.1%
2.4%

We systematically evaluate and compare the performance of classical statistical methods (ARIMA), mechanistic compartmental models (SEIR), modern deep learning architectures (LSTM, DLinear, Autoformer), and an emerging time-series foundation model (TabPFN-TS) in forecasting the incidence of Influenza-Like Illness (ILI) across nine European countries. The models are benchmarked against a naive baseline and a multi-model ensemble (RespiCast) created by an initiative of the ECDC. In line with the operational practice of existing forecasting hubs, our entire evaluation is explicitly optimized for short-term horizons (1 to 4 weeks ahead). We found that the foundation model TabPFN-TS shows strong zero-shot inference capabilities. Without any task-specific retraining, it successfully overcomes extreme data scarcity to consistently outperform all other individual architectures, frequently rivalling or surpassing the RespiCast ensemble. Our results highlight how deep learning architectures are severely constrained by extreme data scarcity, typical in epidemic forecasting, requiring targeted endogenous data augmentation to reduce predictive errors. Within the deep learning class of models, we observe that simpler architectures (such as DLinear and LSTM) frequently exhibit greater robustness and outperform complex, attention-based models (such as Autoformer) when data is constrained. Finally, our results show how a weighted ensemble, constructed by fusing all the models, delivers highly robust forecasts in all regions considered. Overall, our findings showcase the transformative potential of zero-shot foundation models in epidemic forecasting and confirm the importance of multi-model ensembles.

4
Estimation of hospital catchment populations using data on patient hospital use in France

Shirreff, G.; Chauvel, C.; Casalegno, J.-S.; Vanhems, P.; Dananche, C.; Redjaline, A.; Tazarourte, K.; Nunes, M.

2026-04-29 epidemiology 10.64898/2026.04.28.26351911 medRxiv
Top 0.1%
1.9%

Background: Estimates of disease burden from hospital data require well-informed estimates of the size of the catchment population. Data on patient flows from residential areas to a hospital can be used to estimate detailed catchment populations by age, year and type of hospital visit. Methods: Catchment populations were estimated for hospitals throughout France using a proportional flow approach. Data on hospital use and patient residence were accessed from the Agence Technique de l'Information sur l'Hospitalisation (ATIH). For patients coming from each administrative area, we calculated a preference for each hospital, and combined this with population data for the area to estimate the catchment population of each hospital. For one hospital group, we compared this with data on emergency visits, and data from a retrospective cohort study. Results: Estimated catchment population by hospital group ranged from 4 million per year for Assistance Publique - Hôpitaux de Paris (AP-HP) downwards, with the catchment population strongly reflecting geographic proximity and hospital scale. The type of hospital substantially impacted the size of the catchment area. In the analysis of a single hospital group, the size of the catchment population varied widely with the diagnostic categories associated with the hospital visit. Emergency visits represented a smaller catchment population. The estimated proportional contribution of different departments to the estimated catchment population was similar to their contribution to observed hospital admissions. Incidence rates for a respiratory virus using this catchment population estimation method were consistent with national incidence rates. Conclusions: This study demonstrates the consistency of the proportional flow framework when applied to appropriate data on hospital usage. The study provides catchment populations for each hospital in France which can be used for burden estimates such as incidence rates, as well as providing insight into the catchment populations served. Analysis at the department geographic level provided an appropriate balance between detail of analysis and the need to mask data for anonymisation. Further analysis should explore how the size of the catchment area corresponds to the associated travel time to the hospital in question.
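
As an illustration of the proportional flow idea in the abstract above, the following is a minimal sketch, not the authors' implementation. It assumes that an area's "preference" for a hospital is that hospital's share of all admissions from the area, and that a hospital's catchment population is the sum over areas of preference times resident population. The function name, data layout, and toy numbers are hypothetical.

```python
from collections import defaultdict


def catchment_populations(admissions, population):
    """Proportional-flow catchment estimate (illustrative assumption).

    admissions: {(area, hospital): number of admissions}
    population: {area: number of residents}
    """
    # Total admissions originating from each area, across all hospitals.
    area_totals = defaultdict(float)
    for (area, _hospital), count in admissions.items():
        area_totals[area] += count

    catchment = defaultdict(float)
    for (area, hospital), count in admissions.items():
        if area_totals[area] == 0:
            continue  # no observed flow from this area
        # Preference: share of the area's admissions going to this hospital.
        preference = count / area_totals[area]
        catchment[hospital] += preference * population[area]
    return dict(catchment)


# Toy data: two areas, two hospitals (made-up numbers).
admissions = {("A", "H1"): 80, ("A", "H2"): 20,
              ("B", "H1"): 10, ("B", "H2"): 30}
population = {"A": 1000, "B": 2000}
result = catchment_populations(admissions, population)
print(result)
```

Under this assumption the catchment populations of all hospitals sum to the total population of the areas that have any observed admissions, which is a convenient sanity check on real data.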

5
Who infected the reported cases? Evidence from 678,482 COVID-19 cases with identified infector collected in routine surveillance in the Netherlands, 2020-2022.

Backer, J. A.; Leung, K. Y.; Andeweg, S. P.; Van de Kassteele, J.; Veldhuijzen, I.; Hahne, S.; Wallinga, J.

2026-05-17 epidemiology 10.64898/2026.05.15.26347859 medRxiv
Top 0.1%
1.7%

Background: During infectious disease outbreaks, characteristics of reported cases are routinely collected. These give information on becoming infected but not on infecting others. We assess whether linking infectees to infectors, together with their characteristics, can help understand transmission. Methods: From the start of the COVID-19 pandemic in the Netherlands, reported cases were asked to identify their most probable infector in routine surveillance, enabling the linking of cases. We assess for the period 27 February 2020 - 11 April 2022 whether the infectees of these transmission pairs are representative of all reported cases, whether the transmission pairs yield verifiable estimates of epidemiological characteristics (here the serial interval), and whether they provide information on transmission that cannot be obtained otherwise. Results: Of 8,003,008 reported cases, 678,482 (8.5%) could be linked to their most probable infector. These infectees were largely representative of the reported cases regarding age group, sex, and geographical location. The mean serial interval of 3.6 days (sd 3.4 days) from transmission pairs aligns with literature. Transmissions between age groups largely follow known contact patterns. Most transmissions in September 2021 occurred between persons who were not (fully) vaccinated, indicating the effectiveness of the vaccine, and relatively few between persons with different vaccination status, indicating assortative mixing in vaccination status. Conclusion: Transmission pairs can be efficiently collected in routine surveillance, providing insight into disease transmission. The current post-pandemic period provides an excellent opportunity to adjust reporting systems for linking infectees to their most probable infector as preparation for future outbreaks.

6
Exploring emergency department attendance patterns during the UEFA European Football Championship 2024 in Germany

Charfeddine, N.; Schranz, M.; Schlump, C.; Rupprecht, M.; Ullrich, A.; Diercke, M.; AKTIN Research Group; Estupinan Mendez, J.

2026-06-09 epidemiology 10.64898/2026.06.08.26355151 medRxiv
Top 0.1%
1.7%

Background: Mass gathering events (MGEs) are associated with several public health challenges and may cause a strain on healthcare services. Literature findings on the impact of MGEs on emergency departments (EDs) are heterogeneous. Objectives: To examine shifts in ED attendance characteristics during a major sporting tournament, namely the UEFA European Football Championship 2024 held in Germany. Methods: We conducted a retrospective observational study using ED data from the Emergency Department Data Registry. We compared baseline ED attendance characteristics between the tournament and the reference period, defined as two weeks before and two weeks after the tournament, and between Germany game days and non-Germany game days. Hourly attendance patterns were analysed for all Germany games using a reference range. Results: We included data from 41 EDs, totalling 253,493 attendances during the study period. A 1.57% increase in attendance was observed during the tournament compared to the reference period, with baseline characteristics remaining similar. The median daily attendance within all EDs was slightly lower on Germany game days (4066) compared to non-Germany game days (4128). Modest changes were observed in the hourly attendance on Germany game days, most notable during the last Germany game where a decrease in attendance below the reference range extended over three hours. Conclusions: The observed shifts in ED attendance were minimal, suggesting that no major changes of public health relevance occurred in ED attendance during the tournament. We highlight the utility of using ED data for monitoring and for enhancing the understanding of the public health risks and challenges associated with MGEs.

7
Public attitudes toward sharing health data for artificial intelligence: Differences by data type and sector in the Health in Central Denmark cohort

Schaarup, J. R.; Isaksen, A. A.; Hulman, A.

2026-03-22 epidemiology 10.64898/2026.03.19.26348784 medRxiv
Top 0.1%
1.5%

Aims: We aimed to examine public perceptions of sharing various types of health data relevant for AI development, including electronic health records, audio recordings of consultations, medical images, and genetic information, with actors from either the public or the private sectors. Methods: We analysed data from 38,740 participants of the Health in Central Denmark survey conducted in 2024. Participants were asked whether they would share different types of health data with an AI solution in healthcare. Each participant was randomised to either of two versions of the scenario and question where the AI application was developed in the public or private sector. Descriptive results (proportions and percentages) were weighted to represent the background population of approx. 1 million people in the Central Denmark Region. The association between randomization group (data recipient) and data sharing attitude ("Yes", "No", "Don't know") was analysed using multinomial logistic regression with "Don't know" as reference category. Results: Participants were most willing to share medical images (46%), followed by text from patient journals (39%), genetic information (35%), and audio recordings (27%). There were 12-16% higher proportions of willingness to share with public institutions than with private institutions. A high level of uncertainty was observed for all data types (29-36%) regardless of data recipient. Odds ratios ranged from 1.37 to 1.78 for responding "Yes", and from 0.51 to 0.67 for responding "No" to sharing data with public institutions compared to private institutions. Conclusions: Public acceptance of health data sharing for AI depends on both the perceived sensitivity of the data and the institutional context of use. Strong public governance, transparent safeguards, and clear communication about data use may be important for maintaining trust and enabling responsible development of AI in healthcare.

8
Transdiagnostic Approach in Cerebral Palsy

Gates, P.; Chun, C. A.; Bonneau, L. C.; Soliman, D. A.

2026-04-28 orthopedics 10.64898/2026.04.27.26351832 medRxiv
Top 0.1%
1.3%

OBJECTIVES: Demonstrate correlations of clinic-based measures of International Classification of Functioning, Disability and Health (ICF) Body Structure and Function, capacity and performance with a school-based performance measure in children with Cerebral Palsy (CP) using a transdiagnostic approach. METHODS: 102 ambulatory children with CP underwent assessment of Gross Motor Function Classification System (GMFCS), Gross Motor Function Measure (GMFM), Pediatric Quality of Life Inventory Generic Core Scales (PedsQL), 3-Dimensional Gait Analysis, Gillette Functional Assessment Questionnaire (GFAQ), and Pediatric Outcomes Data Collection Instrument (PODCI) done in clinics, compared with School Function Assessment (SFA) done in schools. Here we report on SFA correlations. For this paper, Spearman's correlations were calculated. RESULTS: All measures showed some significant correlations with the SFA; the greatest number of moderate to strong correlations were with PODCI, including PODCI comorbidities scales. PODCI performance questionnaire was correlated with all SFA scales. PODCI, as a performance measure, is broader, more holistic, than the capacity and BSF measures. Findings are demonstrative of a focus on the ICF approach, indicating separate domains of function and well-being, reflective of the transdiagnostic approach. CONCLUSIONS: The transdiagnostic approach, looking at a broader picture than simply diagnosis, thus paralleling concepts presented in the ICF, is beneficial in assessing functioning and well-being in children with CP.

9
Trends in hospitalization rates for ocular diseases in Brazil

Dutra, I.; Soares, V. R.; Carvalho, L. M.

2026-05-21 epidemiology 10.64898/2026.05.18.26353540 medRxiv
Top 0.2%
1.2%

This study mapped the age- and region-specific risks of eye diseases in the Brazilian population, evaluating temporal trends and geographical inequalities in access to healthcare. Secondary data from DATASUS, covering the 27 Brazilian federative units from 2010 to 2024, were used, employing hierarchical negative binomial regression. A significant national increase in hospital admission rates was observed during the studied period, with increases of 160.8% for retinopathy, 126.4% for eye and appendage diseases, and 122.8% for glaucoma. State-level heterogeneity was extreme, with variations spanning from -93.1% to +3588% for glaucoma, for example. Even so, regional disparities were observed throughout the period; the South region reported an average 43.2% higher than the national average for retinopathies, and the Southeast 28.5% higher for eye and adnexal diseases, while the North region reported the lowest rates. Projections up to 2036 predict a further national increase of up to +377.0% for retinopathies, with interventions covering more than an order of magnitude. In addition to the temporal projection, rates in state, age, and year components on a logarithmic scale with calibrated uncertainty were verified. Out-of-sample tests show that the chosen modeling outperforms the last observed value maintenance method and naive linear extrapolation in all three diseases considered. Thus, the escalating, age-driven burden of ophthalmological diseases and profound geographic disparities highlight an urgent need to decentralize specialized care and target resource allocation within the public health system.

10
Characteristics of individuals with cerebral palsy across the United States

Aravamuthan, B. R.; Bailes, A. F.; Baird, M.; Bjornson, K.; Bowen, I.; Bowman, A.; Boyer, E.; Gelineau-Morel, R.; Glader, L.; Gross, P.; Hall, S.; Hurvitz, E.; Kruer, M. C.; Larrew, T.; Marupudi, N.; McPhee, P.; Nichols, S.; Noritz, G.; Oleszek, J.; Ramsey, J.; Raskin, J.; Riordan, H.; Rocque, B.; Shah, M.; Shore, B.; Shrader, M. W.; Spence, D.; Stevenson, C.; Thomas, S. P.; Trost, J.; Wisniewski, S.

2026-04-16 pediatrics 10.64898/2026.04.14.26350870 medRxiv
Top 0.2%
0.9%

Objective: Cerebral palsy (CP) affects approximately 1 million Americans and 18 million individuals worldwide, yet contemporary US epidemiologic data remains limited. We aimed to use the Cerebral Palsy Research Network (CPRN) clinical registry to describe demographics and clinical characteristics of individuals with CP across the US and determine associations with gross motor function and genetic etiology. Methods: Registry subjects were included if they had clinician-confirmed CP and prospectively entered data for Gross Motor Function Classification System (GMFCS) Level, gestational age, genetic etiology, CP distribution, and tone/movement types. Logistic regression was used to determine which of these variables plus race, sex, ethnicity, and age were associated with GMFCS level and genetic etiology. Results: A total of 9,756 children and adults with CP from 22 CPRN sites met inclusion criteria. Participants were predominantly White (73.0%), male (57.3%), non-Hispanic (87.8%), and younger than 18 years (73.7%). Most were classified as GMFCS levels I-III (55.6%), born preterm (52.8%), had spasticity (83.8%), and had quadriplegia (41.9%); 12.2% were identified as having a genetic etiology. Tone/movement types, CP distribution, and gestational age were significantly associated with both GMFCS level and genetic etiology (p<0.001). Compared to White individuals, Black individuals were more likely to have greater gross motor impairment (p<0.001). Conclusion: In this large US cohort, clinical and demographic factors, including race, were associated with gross motor function and genetic etiology in CP. These findings highlight persistent disparities and demonstrate the value of a national clinical registry for informing prognostication, quality improvement efforts, and targeted genetic testing strategies.

11
Robustly Quantifying Uncertainty in International Avian Influenza A(H5N1) Infection Fatality Ratios

Gada, L.; Afuleni, M. K.; Noble, M.; House, T.; Finnie, T.

2026-04-23 public and global health 10.64898/2026.04.22.26351373 medRxiv
Top 0.2%
0.9%

Knowing the mortality rates associated with infection by a pathogen is essential for effective preparedness and response. Here, harnessing the flexibility of a Bayesian approach, we produce an estimate of the Infection Fatality Ratio (IFR) for A(H5N1) conditional on explicit assumptions, and quantify the uncertainty thereof. We also apply the method to first-wave COVID-19 data up to March 2020, demonstrating the estimates that could be obtained were the model available then. Our analysis uses World Development Indicators (WDI) from the World Bank, the A(H5N1) WHO confirmed cases and deaths tracker by country (2003-2024), and COVID-19 cases and deaths data from Johns Hopkins University (January and February 2020). Since infectious disease dynamics are typically influenced by local socio-economic factors rather than political borders, individual countries are placed within clusters of countries sharing similar WDIs relevant to respiratory viral diseases, with clusters derived by performing Hierarchical Clustering. To estimate the IFR, we fit a Negative Binomial Bayesian Hierarchical Model for A(H5N1) and COVID-19 separately. We explicitly modelled key unobserved parameters with informative priors from expert opinion and literature. By modelling underreporting, our analysis suggests lower fatality (15.3%) compared to WHO's Case Fatality Ratio estimate (54%) on lab-confirmed cases. However, credible intervals are wide ([0.5%, 64.2%] 95% CrI). Therefore, good preparedness for a potential A(H5N1) pandemic implies adopting scenario planning under our central estimate, as well as for IFRs as high as 70%. Our approach also returns a COVID-19 IFR estimate of 2.8% with [2.5%, 3.1%] 95% CrI which is consistent with literature.

Key Messages
- We adopted a disease-agnostic and adaptable Bayesian model, embedding scientific knowledge on A(H5N1) in the priors informed by published literature, to estimate the Infection Fatality Ratio (IFR) of avian influenza A(H5N1).
- Accounting for underreporting of cases and deaths, we estimate the IFR of avian influenza A(H5N1) at 15.3%, albeit with wide uncertainty ([0.5%, 64.2%] 95% Credible Intervals).
- Due to the uncertainty in the estimate, good preparedness for a potential A(H5N1) pandemic implies adopting scenario planning under our central estimate, as well as for IFRs as high as 70%.

12
Intervention and evaluation protocol of fit4future Kids: A multi-component health promotion programme in German primary schools

Sterr, K.; Blaschke, S.; Hess, D.; Lux, L.; Brandmeier, A.; Mess, F.

2026-05-26 public and global health 10.64898/2026.05.23.26353928 medRxiv
Top 0.2%
0.9%

Background: Schools are widely recognised as key settings for promoting children's health behaviours. However, many schools struggle with the implementation and especially sustainment of health promotion programmes, e.g. due to limited resources. Strengthening schools' capacity for health promotion has therefore been identified as a central strategy for achieving better implementation and ultimately behaviour change outcomes among children. The fit4future Kids programme was developed as a large-scale, multi-component initiative in Germany that aims to promote children's physical activity, nutrition, mental health, and responsible digital media use while simultaneously supporting schools in building structures for sustainable health promotion. Methods: This paper describes the intervention and evaluation protocol of the nationwide fit4future Kids programme implemented in several cohorts of German primary schools from Sept. 2022 to Sept. 2027. The intervention is based on the Health Promoting Schools framework and integrates established implementation and behaviour change frameworks, including the Consolidated Framework for Implementation Research, the COM-B model, and Behaviour Change Techniques. The programme combines curricular materials, environmental components, and structured implementation support to facilitate the integration of health promotion into everyday school practice. The evaluation follows a mixed-methods design involving multiple stakeholder groups, including school staff, parents, and children. Quantitative and qualitative data are collected to assess implementation processes, contextual factors, and programme outcomes. The large and diverse sample of 1,153 participating primary schools allows for the exploration of different implementation trajectories and the investigation of potential equity-related effects. Discussion: By combining evidence-based health promotion strategies with implementation science approaches, fit4future Kids provides a large-scale real-world example of how schools can be supported in implementing sustainable health promotion. The evaluation is expected to generate important insights into the implementation and potential effectiveness of multi-component school-based interventions and to inform future initiatives aiming to strengthen health-promoting school environments.

13
Episia: An Open-Source Python Library for Epidemiological Surveillance, Modeling, and Biostatistics in Resource-Limited Settings

Ouedraogo, F. A. S.

2026-04-20 epidemiology 10.64898/2026.04.17.26350337 medRxiv
Top 0.2%
0.9%

Despite the evolution of epidemiological analysis and modeling tools, difficulties remain, especially in developing countries, regarding the availability and use of these tools. Often expensive, and demanding high technical expertise, constant connectivity, and sometimes significant resources, these tools, although efficient, are poorly matched to the operational realities of health districts. It is in this context that we introduce Episia, an open-source Python library designed to provide a framework that facilitates epidemiological analysis and modeling. It integrates a suite of compartmental epidemic models (SIR, SEIR, SEIRD) with a sensitivity analysis using the Monte Carlo method, a complete biostatistics suite validated against the OpenEpi reference standard, as well as a native DHIS2 client for automated data ingestion. Developed in Burkina Faso, it aims to address the health challenges encountered in Africa while remaining a versatile tool for global health informatics.

14
Data Resource Profile: EST-Health-30

Reisberg, S.; Oja, M.; Mooses, K.; Tamm, S.; Sild, A.; Talvik, H.-A.; Laur, S.; Kolde, R.; Vilo, J.

2026-04-24 epidemiology 10.64898/2026.04.21.26351087 medRxiv
Top 0.2%
0.9%

Background: The increasing availability of routinely collected health data offers new opportunities for population-level research, yet access to comprehensive, linked, and standardised datasets remains limited. We describe EST-Health-30, a large-scale, population-representative health data resource from Estonia. Methods: EST-Health-30 comprises a random 30% sample of the Estonian population (~500,000 individuals), with longitudinal data from 2012 to 2024 and annual updates planned through 2026. Individual-level records are linked across five nationwide databases, including electronic health records, health insurance claims, prescription data, cancer registry, and cause of death records. A privacy-preserving hashing approach ensures consistent cohort inclusion over time while maintaining pseudonymisation. All data are harmonised to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (version 5.4) using international standard vocabularies. Data quality was assessed using established OMOP-based validation frameworks. Results: The dataset contains rich multimodal information on diagnoses, procedures, laboratory measurements, prescriptions, free-text clinical notes, healthcare utilisation, and costs, with high population coverage and longitudinal depth. Data quality assessment showed high completeness and consistency, with 99.2% of applicable checks passing. The age-sex distribution closely reflects the national population, supporting representativeness, though coverage is marginally below the target 30% (29.2%), primarily attributable to recent immigrants without health system contact. The dataset enables construction of detailed clinical cohorts, analysis of disease trajectories, and evaluation of healthcare utilisation and outcomes across the life course. Conclusions: EST-Health-30 is a comprehensive, standardised, and population-representative real-world data resource that supports epidemiological, clinical, and methodological research. Its alignment with the OMOP CDM facilitates reproducible analytics and participation in international federated research networks, while secure access infrastructure ensures compliance with data protection regulations.

Key features
- EST-Health-30 is a population-representative dataset of complete health records for a random 30% sample of the Estonian population (~500,000 individuals) spanning 2012-present, enabling population-level epidemiological analyses with annual updates.
- The dataset is constructed using a random sampling approach based on hashed password-protected personal identifiers, ensuring consistent inclusion over time with unbiased population coverage.
- Individual-level data are linked across multiple nationwide databases, including electronic health records, claims, prescriptions, cancer and cause of death registry data, enabling multimodal analyses of health trajectories.
- All data are standardised to the OMOP Common Data Model (CDM) version 5.4 using international vocabularies (e.g., SNOMED CT, RxNorm, LOINC), supporting reproducibility and participation in federated research networks.
- The dataset is accessible through a secure processing environment compliant with the European Health Data Space (EHDS) framework.

15
Predicting COVID-19 incidence from seroprevalence and population-based cohort data using interpretable machine learning with differential privacy analysis

Krepel, J.; Binkyte, R.; Kerkouche, R.; Harries, M.; Klett-Tammen, C. J.; Fritz, M.; Kesselheim, S.; Kuehn, M.; Bazarova, A.; Lange, B.

2026-04-02 epidemiology 10.64898/2026.04.01.26349876 medRxiv
Top 0.3%
0.8%

During the COVID-19 pandemic, reported incidence data played a central role in public health surveillance and in tracking epidemic dynamics, although they provide limited insight into the behavioral, immunological, and socioeconomic drivers of transmission. Population-based seroprevalence studies with linked survey data offer a rich but untapped source of individual-level information that can complement routine surveillance. In this study, we investigate whether aggregated seroprevalence cohort data can be leveraged to predict local COVID-19 incidence and to identify interpretable predictors associated with transmission dynamics. Using data from the Multilocal SeroPrevalence (MuSPAD) study in Germany (2020-2022), we trained multiple machine learning models, including least absolute shrinkage and selection operator (LASSO), vector autoregressive models (VAR), multilayer perceptrons (MLPs), and long short-term memory neural networks (LSTMs), to predict location-specific seven-day incidence rates. Feature importance was assessed using regression coefficients where applicable and model-agnostic explainability methods, including Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP). Across model classes, cohort-derived features enabled accurate prediction of local incidence, with time-aware models achieving the strongest performance. Consistent predictors included prior infection and testing history, employment-related changes, vaccination status, and mask-wearing behavior, highlighting the importance of behavioral and reporting-related signals. While differential privacy introduced modest degradation in predictive performance under strict privacy budgets, SHAP-based explanations remained stable, and LIME-based explanations were more sensitive to privacy-induced noise. These results demonstrate that aggregated cohort data encode meaningful and interpretable signals of population-level transmission dynamics. 
Population-based serosurveys therefore provide a complementary source of information for predicting local COVID-19 incidence and identifying key drivers of transmission beyond routine surveillance data. Our findings show that integrating interpretable machine learning with privacy-aware analysis enables actionable insights from sensitive cohort data, supporting their use in digital epidemiology and informing data-driven public health decision-making.

16
WELL-ED: Wellbeing and Education linkages in school-aged children - A protocol for a population-based register study and survey of adolescents

Kosola, S.; Salonen, S.; Miettinen, J.; Horhammer, I.; Impio, A.-R.; Kumpulainen, S. M.; Sergejeff, J.; Numari, S.; Laitinen-Parkkonen, P.; Tapola-Haapala, M.; Aaltio, E.; Thorn, L.

2026-06-08 public and global health 10.64898/2026.06.06.26355053 medRxiv
Top 0.3%
0.8%

Introduction. Education is a core social determinant of health for children and adolescents. Unfortunately, academic achievement, health, and wellbeing of adolescents have decreased in many developed countries in the past decade. The purpose of the Wellbeing and Education linkages in school-aged children (WELL-ED) study is to examine associations of school absences and academic achievement with use of school-based and community-based health and social welfare services. In addition, we will assess user experiences and multi-sector service pathways of school-aged children for a better understanding of how the service system could respond to the needs of children. Methods and analysis. WELL-ED is a large population-based study that combines register data on school absences and educational support from municipalities with register data on healthcare and social service use collected from wellbeing services counties in Finland. The study cohort includes all children who attended mandatory education in public schools in Southern Finland in school year 2023-2024. A smaller cohort of adolescents in school year 8 was invited to complete a user experience survey. The primary outcomes of this study are related to equity of service use. Ethics and dissemination. The Regional Committee on Medical Research Ethics of the Helsinki and Uusimaa Hospital District (2803/2024) has approved the WELL-ED study protocol. For the survey, adolescents in year 8 and parents of adolescents younger than 15 provided informed consent. Results will be published in peer-reviewed journals, summaries will be sent to participating municipalities and wellbeing services counties, and press releases will be written on key findings.

17
High-Sensitivity Radiation-Free Triage for Adolescent Idiopathic Scoliosis via 3D Point Cloud Geometry

Yang, J.; Shi, H.; Huang, Z.; Wang, X.; Wang, W.; Zhang, T.; Wang, J.; Zhan, Y.; Liu, H.; Zhang, Z.; Zhang, J.; Fei, Z.; Xuan, X.; Gao, Y.; Deng, Y.; Wang, L.; Liu, X.; Tian, L.; Zhang, Y.; Ai, L.; Yang, J.

2026-03-16 public and global health 10.64898/2026.02.11.26346069 medRxiv
Top 0.3%
0.8%

Widespread screening for Adolescent Idiopathic Scoliosis (AIS) is critical for early intervention, yet it is currently bottlenecked by the inherent limitations of traditional methods. Radiographic diagnosis poses cumulative radiation risks, while manual physical examinations are highly subjective and time-consuming. Recent non-invasive 2D computer vision approaches suffer from an unavoidable "dimensionality gap," failing to capture critical depth and rotational information, which frequently leads to diagnostic misjudgments. To address these challenges, we present PointScol, a high-sensitivity, radiation-free triage system leveraging direct geometric processing of 3D back surface point clouds. Our framework employs a sequential pipeline: first, an automated segmentation module rigorously standardizes the input geometry by isolating the dorsal region of interest; subsequently, a diagnostic classification module evaluates the spinal deformity. Validation on a multi-center dataset (n=128) demonstrated that for the primary screening task (10° Cobb angle threshold), PointScol achieved 100.00% sensitivity in the external cohort, acting as a reliable gatekeeper to safely rule out healthy individuals without missing any cases requiring referral. Building upon the robust accuracy established at this 10° baseline, an extended 5-class grading module provides further diagnostic value. Rather than functioning as a rigid predictive task, this multi-class stratification acts as an advanced clinical assistant, offering nuanced severity insights to guide referral urgency and optimize medical resource allocation for high-risk patients. Collectively, this sequential design establishes PointScol as a safe and highly efficient clinical filter: it reliably prevents unnecessary radiation exposure for healthy adolescents while ensuring prioritized interventions for those most in need.

18
Automated bioinformatic pipeline for unbiased detection of tuberculosis transmission clusters: Real-time impact and retrospective insights

Genestet, C.; Testard, Q.; Ben-Hassen, G.; Bardel, C.; Vallee, M.; Bourg, C.; Bahuaud, O.; Joannard, B.; Tatai, C.; Barabotti, S.; Ader, F.; Dananche, C.; Hodille, E.; Dumitrescu, O.

2026-03-19 epidemiology 10.64898/2026.03.16.26348245 medRxiv
Top 0.3%
0.7%

Background. In low-burden countries such as France, whole-genome sequencing (WGS) is increasingly integrated into routine tuberculosis (TB) surveillance to improve case management and transmission monitoring. However, applying WGS to all TB cases generates large volumes of data, requiring automated tools for timely interpretation and outbreak response. Methods. Since November 2016, all clinical M. tuberculosis isolates diagnosed in eight hospitals from three cities of Auvergne-Rhone-Alpes in France have undergone WGS. In July 2023, an automated pipeline for anti-TB drug resistance prediction and unbiased detection of transmission clusters based on SNP distances was implemented. Epidemiological, microbiological and clinical data were collected, with contact duration classified as household, frequent, or occasional. Index cases were stratified by their level of extra-household transmission (EHT), and statistical analyses were performed to identify associated factors. Findings. Among 1,152 TB patients diagnosed between 2016 and 2025, 75 clusters involving 247 patients (21.4%) were identified. WGS reliably detected resistance to first-line anti-TB drugs, leveraging the WHO mutation catalogue. Routine WGS enabled real-time alerts for TB control centres, leading to expanded field investigations, including community spillover, nosocomial transmissions, and a school outbreak. Classical indicators of contagiousness (smear results, cavitary disease) were not associated with EHT level. Instead, lower TB severity indices and longer duration of symptoms were linked to higher EHT level. Interpretation. Systematic WGS supports timely identification of drug resistance and transmission events and provides new insights into contagiousness factors. The automated pipeline enables direct interpretation by clinical microbiologists, facilitating real-time public health action. In this study, we demonstrate how, with the appropriate pipeline, WGS offered a time- and cost-effective solution for routine TB management. Funding. This work was supported by SHAPE-Med@Lyon, a French government grant managed by the French National Research Agency under the France 2030 program (reference ANR-22-EXES-0012).

19
Availability and Quality of Anthropometric Data in Swiss Children's Hospitals: The SwissPedGrowth Project

Leuenberger, L. M.; Shoman, Y.; Romero, F.; Deligianni, X.; Hartung, A.; Mozun, R.; Goebel, N.; Bielicki, J. A.; Burckhardt, M.-A.; Latzin, P.; Saner, C.; Posfay-Barbe, K. M.; Schwitzgebel, V.; Giannoni, E.; Hauschild, M.; Stocker, M.; Righini-Grunder, F.; Lauener, R.; Mueller, P.; Schlapbach, L. J.; Jenni, O. G.; Spycher, B. D.; Kuehni, C. E.; Belle, F. N.; for the SwissPedHealth Consortium,

2026-03-30 health informatics 10.64898/2026.03.27.26349493 medRxiv
Top 0.3%
0.7%

OBJECTIVE: Anthropometric data are critical in paediatric care, routinely assessed during clinical visits, and available in electronic health records (EHRs). We describe the feasibility of extracting anthropometric data from heterogeneous EHR systems of Swiss children's hospitals, evaluate their availability and quality, and assess the cohort's representativeness of the general population. METHODS: In this multicentre study (SwissPedGrowth), we retrospectively collected EHRs from patients <20 years who visited hospitals in Basel, Bern, Geneva, Lausanne, Luzern, St. Gallen, or Zurich between 2017 and 2023. Sociodemographic, administrative, and clinical information from EHRs was provided in a standardized way by a paediatric national data stream (SwissPedHealth), including the Swiss Neighbourhood Index of Socioeconomic Position (Swiss-SEP). We counted anthropometric recordings per visit to describe availability and used a self-developed and an existing (growthcleanr) algorithm to investigate data quality. To assess representativeness, we compared sociodemographic characteristics between SwissPedGrowth and the general paediatric population in Switzerland, computed standardized differences (effect size: 0.2 small, 0.5 medium, 0.8 large), and weighted the study population to reduce differences. RESULTS: We included 477,531 patients and 2,171,633 hospital visits; 54% boys, 71% Swiss, mean Swiss-SEP 65 (SD: 11), and median age at visit 6.3 [IQR: 2.3, 11.8] years. Height recordings were available for 20% of the visits, weights for 43%, and head circumferences for 5%, with better availability for inpatient stays than outpatient or emergency visits. Combining the self-developed and existing algorithms, 4% of heights and 3% of weights were flagged as outliers and 29% of heights and 31% of weights as carried forward from previous visits or same-day duplicates. 
Sociodemographic differences between SwissPedGrowth and the general population were small or small-to-medium and disappeared after weighting. CONCLUSION: SwissPedGrowth demonstrates feasibility of extracting high-quality anthropometric data for paediatric growth research, but challenges regarding completeness and harmonization of EHR data across Swiss hospitals remain.

20
Assessing the impact of a gender-neutral approach to HPV vaccination on vaccination coverage for nine-year-old girls in Cameroon: a retrospective, cross-sectional study

Griffith, B. C.; Iliassu, S.; Mbanga, C.; Ngenge, B. M.; Patel, S.; Graves, J. C.; Singh, N.; Ndoula, S.; Njoh, A. A.; Gisele, E.; Mngemane, S.; Ajayi, T.; Zultak, L. A.; Saidu, Y.

2026-04-11 public and global health 10.64898/2026.04.09.26350560 medRxiv
Top 0.3%
0.7%

Cameroon introduced the human papillomavirus vaccine (HPVV) into the routine immunization schedule in October 2020. By the end of 2022, coverage remained low. To increase coverage, Cameroon switched to a country-wide, gender-neutral vaccination (GNV) approach in 2023, coupled with a revamped delivery strategy consisting of Community Dialogues (CDs) and Periodic Intensification of Routine Immunization (PIRIs) activities in selected health districts (HDs). We assessed the impact of these programmatic changes, notably the GNV approach, on HPVV coverage. This retrospective, cross-sectional study measured the effect of GNV and CDs + PIRIs on HPVV coverage among 9-year-old girls in Cameroon (2022-2023). Data on HPVV coverage from all 203 HDs were extracted from DHIS2, and coverage was calculated at the HD level, based on the estimated eligible population of 9-year-old girls. Descriptive statistics and multiple regression models were employed to assess the impact of GNV on vaccination coverage while adjusting for CDs + PIRIs and urban/rural status. In 2023, of the 203 HDs, 115 (56.7%) conducted GNV only, 74 (36.5%) implemented GNV & CDs + PIRIs, and 154 (75.9%) were classified as rural. Among age-eligible girls, there was an overall increase in HPV vaccination coverage, with coverage rising 39.2 percentage points from 2022 to 2023. Following multiple linear regression, there was a significant increase in HPVV coverage in HDs with GNV & CDs + PIRIs compared to those with no GNV and no CDs + PIRIs (β: 55.5%, 95% CI: 38.7, 72.3, p<0.001). Furthermore, there was a significant increase in HPVV coverage in HDs with GNV only compared to those with no GNV or no CDs + PIRIs (β: 28.7%, 95% CI: 12.5, 45.0, p=0.001). Overall, the GNV approach increased HPVV coverage for girls significantly, particularly when implemented alongside CDs + PIRIs.